THE DISTRIBUTION OF PRIME NUMBERS 



K. SOUNDARARAJAN 



What follows is an expanded version of my lectures at the NATO School on Equidis- 
tribution. I have tried to keep the informal style of the lectures. In particular, I have 
sometimes oversimplified matters in order to convey the spirit of an argument. 

Lecture 1: The Cramer model and gaps between consecutive primes 

The prime number theorem tells us that $\pi(x)$, the number of primes below $x$, is $\sim x/\log x$. Equivalently, if $p_n$ denotes the $n$-th smallest prime number then $p_n\sim n\log n$. What is the distribution of the gaps between consecutive primes, $p_{n+1}-p_n$?

We have just seen that $p_{n+1}-p_n$ is approximately $\log n$ "on average". How often do we get a gap of size $2\log n$, say; or of size $\frac12\log n$? One way to make this question precise is to fix an interval $[\alpha,\beta]$ (with $0\le\alpha<\beta$) and ask for

$$\lim_{N\to\infty}\frac1N\#\Big\{2\le n\le N:\ \frac{p_{n+1}-p_n}{\log n}\in[\alpha,\beta]\Big\}. \tag{1.1}$$

Does this limit exist, and if so what does it equal? 

Here is another way to formulate this question. Consider intervals of the form $[n,n+\log n]$ as $n$ ranges over integers up to $N$. On average such an interval contains one prime. But of course some intervals may not contain any prime, and others may contain several. Given a non-negative integer $k$, how often does such an interval contain exactly $k$ primes? What is

$$\lim_{N\to\infty}\frac1N\#\{n\le N:\ \pi(n+\log n)-\pi(n)=k\}? \tag{1.2}$$

Or more generally, for a fixed real number $\lambda>0$ we may ask for

$$\lim_{N\to\infty}\frac1N\#\{n\le N:\ \pi(n+\lambda\log n)-\pi(n)=k\}? \tag{1.3}$$

In this lecture we will describe the conjectured answers to these questions, but we confess at the outset that no one knows how to prove those conjectures. While conjecturing the prime number theorem, Gauss stated that the 'density of primes' around $x$ should be $1/\log x$. He based his conjecture on extensive numerical investigations. In particular, he divides the numbers up to three million into intervals of length 100 (a "centad") and meticulously tabulates the number of centads with no primes, exactly one prime, etc.¹ While he does not seem to make a synthesis of his results (except to conjecture the prime number theorem), it seems clear that he was seeking to understand a question like (1.3). It was left to Harald Cramer [5] to set Gauss's work on a probabilistic footing.

Cramer's model. The primes behave like independent random variables $X(n)$ ($n\ge3$) with $X(n)=1$ (the number $n$ is 'prime') with probability $1/\log n$, and $X(n)=0$ (the number $n$ is 'composite') with probability $1-1/\log n$.

Let us suppose that the primes behave like a typical sequence in this random model, and answer questions (1.1) and (1.3). We want the 'probability' that $p_{n+1}-p_n$ lies between $\alpha\log n$ and $\beta\log n$. Thus, given the prime $p_n$, we want $p_n+1,\dots,p_n+h-1$ to be composite, and $p_n+h$ to be prime, where $\alpha\log n\le h\le\beta\log n$. According to Cramer's model, this occurs with probability
model, this occurs with probability 

y- nfi ^ ] ^ V fi-J-l'"'J- 

alogn^fe/JlognM^ ^Ogipn + j) ^Ogipn + h) , „ V logn^ logn 

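since $\log(p_n+j)\sim\log n$ as $p_n\sim n\log n$ and $j\ll\log n$. This is
$$\approx\sum_{\alpha\le h/\log n\le\beta}\frac{1}{\log n}\,e^{-h/\log n}\approx\int_\alpha^\beta e^{-t}\,dt$$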

for large $n$, because the LHS looks like a Riemann sum approximation to the integral in the RHS. This is the conjectured answer to question (1.1): the probability "density" of finding $p_{n+1}-p_n$ close to $t\log n$ is $e^{-t}$. This is an example of what is known as a "Poisson process" in the probability literature; see Feller [7].
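The prediction $e^{-t}$ can be compared with actual primes. The following Python sketch is our own illustration, not part of the lectures. It sieves the primes up to $10^6$, normalizes each gap by $\log p_n$ (asymptotically the same as $\log n$, but a better scale at this height), and tabulates how often the normalized gap falls in a few intervals $[\alpha,\beta]$, next to $e^{-\alpha}-e^{-\beta}$. The cut-off $10^6$, the intervals, and all function names are arbitrary assumptions. At such small heights only rough agreement should be expected.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: return the list of primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            # Mark multiples of i, starting at i*i, as composite.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def normalized_gaps(limit):
    """Return (p_{n+1} - p_n) / log p_n for consecutive primes up to limit."""
    ps = primes_up_to(limit)
    return [(q - p) / math.log(p) for p, q in zip(ps, ps[1:])]


def compare_with_exponential(gaps, alpha, beta):
    """Observed fraction of gaps in [alpha, beta] and the prediction e^-alpha - e^-beta."""
    observed = sum(alpha <= g <= beta for g in gaps) / len(gaps)
    return observed, math.exp(-alpha) - math.exp(-beta)


if __name__ == "__main__":
    gaps = normalized_gaps(10**6)  # assumption: 10^6 is an arbitrary cut-off
    print(f"mean normalized gap: {sum(gaps) / len(gaps):.4f}")
    for a, b in [(0.0, 0.5), (0.5, 1.0), (1.0, 2.0), (2.0, 4.0)]:
        obs, pred = compare_with_exponential(gaps, a, b)
        print(f"[{a}, {b}]: observed {obs:.3f}, predicted {pred:.3f}")
```

The mean normalized gap should be close to 1. The small-gap bins reveal the parity effects (all gaps beyond the first are even) that the model ignores.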

Exercise 1. Show similarly that the Cramer model predicts that the answer to question (1.3) is $\frac{\lambda^k}{k!}e^{-\lambda}$. This is the Poisson distribution with parameter $\lambda$.
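Exercise 1 can also be checked by simulation. The sketch below is our own. It draws the independent variables $X(n)$ of Cramer's model for $n$ in a stretch starting at $n_0=10^{12}$ (an arbitrary assumption), counts the 'primes' in each window of length $\lambda\log n_0$, and compares the observed frequencies with $\lambda^ke^{-\lambda}/k!$. The window length is rounded to an integer, and the count in a window is really binomial, so discrepancies of order $1/\log n_0$ are expected.

```python
import math
import random


def cramer_window_counts(n0, length, lam, seed=0):
    """Simulate Cramer's model and return, for each n0 <= n < n0 + length, the number
    of n' in [n, n + window) with X(n') = 1, where window = round(lam * log n0)."""
    rng = random.Random(seed)
    window = max(1, round(lam * math.log(n0)))
    total = length + window
    # X(n) = 1 with probability 1/log n, independently for each n.
    xs = [1 if rng.random() < 1 / math.log(n0 + i) else 0 for i in range(total)]
    counts = []
    running = sum(xs[:window])
    for i in range(length):
        counts.append(running)
        running += xs[i + window] - xs[i]  # slide the window one step to the right
    return counts


def poisson_pmf(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)


if __name__ == "__main__":
    lam = 1.0
    # Assumption: n0 = 10^12 and 10^6 windows are arbitrary illustrative choices.
    counts = cramer_window_counts(10**12, 10**6, lam, seed=1)
    for k in range(5):
        freq = counts.count(k) / len(counts)
        print(f"k={k}: simulated {freq:.4f}, Poisson {poisson_pmf(k, lam):.4f}")
```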

The reader may well object that these predictions are dubious: clearly the probability that $n$ and $n+1$ are both primes must be zero, but the Cramer model assigns this event a probability $1/(\log n\log(n+1))$. More generally, suppose we are given a set $\mathcal H=\{h_1,\dots,h_k\}$ of $k$ distinct integers, and we ask for the number of integers $n\le x$ with $n+h_1,n+h_2,\dots,n+h_k$ all being prime. The Cramer model would predict an answer of $\sim x/(\log x)^k$, but clearly we must take into account arithmetic properties of the set $\mathcal H$. For example, if there were a prime $p$ such that the integers $h_1,\dots,h_k$ occupied all the residue classes (mod $p$), then the integers $n+h_1,\dots,n+h_k$ would also occupy all the residue classes (mod $p$). In particular one of these numbers would be a multiple of $p$, and so there can only be finitely many values of $n$ with $n+h_1,\dots,n+h_k$ all being prime.
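It is easy to test mechanically whether a given $\mathcal H$ is ruled out by this congruence obstruction. Since $k$ integers occupy at most $k$ residue classes, only primes $p\le k$ need to be checked. The following is a minimal Python sketch; the function names are our own.

```python
def small_primes(n):
    """Primes <= n by trial division (adequate for the small n needed here)."""
    return [p for p in range(2, n + 1) if all(p % d for d in range(2, int(p**0.5) + 1))]


def is_admissible(hs):
    """True unless the integers hs occupy every residue class mod some prime p.
    Only p <= k = len(set(hs)) can fail, since k integers fill at most k classes."""
    hs = set(hs)
    for p in small_primes(len(hs)):
        if len({h % p for h in hs}) == p:
            return False
    return True


if __name__ == "__main__":
    for H in [(0, 2), (0, 1), (0, 2, 4), (0, 2, 6), (0, 4, 6, 10, 12, 16)]:
        print(H, is_admissible(H))
```

For example, $\{0,2,4\}$ fails because it covers every class mod 3, while $\{0,2,6\}$ passes.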



¹We refer the reader to www.math.princeton.edu/~ytschink/.gauss for scans of Gauss's manuscripts showing these calculations.
In [17] Hardy and Littlewood proposed the prime $k$-tuple conjecture that
$$\#\{n\le x:\ n+h_1,n+h_2,\dots,n+h_k\ \text{all prime}\}\sim\mathfrak S(\mathcal H)\frac{x}{(\log x)^k} \tag{1.4}$$



for a certain constant $\mathfrak S(\mathcal H)$ called the 'singular series.' The constant $\mathfrak S(\mathcal H)$ equals 0 if the elements $h_1,\dots,h_k$ occupy a complete set of residue classes (mod $p$) for some prime $p$, and is positive otherwise. We will describe this conjecture in more detail below.

The aim of this lecture is to describe a beautiful calculation of Gallagher [9] which shows that the Hardy-Littlewood conjecture (1.4) implies the same distribution of gaps between primes predicted by the Cramer random model. The crux of his proof is that although $\mathfrak S(\mathcal H)$ is not always 1 (as the Cramer model would have), it is $\sim1$ on average over all $k$-element sets $\mathcal H$ with the $h_j\le h$.

The Hardy-Littlewood Conjecture. We now motivate the Hardy-Littlewood conjecture (1.4) and describe the singular series $\mathfrak S(\mathcal H)$ that arises there. As a toy model for prime numbers let us fix an integer $q$ and consider the reduced residue classes (mod $q$). Out of the $q$ total residue classes, there are $\phi(q)$ reduced classes, and we may think of $\phi(q)/q$ as the 'probability' of a class being reduced. Now suppose we are given the set $\mathcal H=\{h_1,\dots,h_k\}$ and we ask for the number of $n$ (mod $q$) such that $n+h_1,\dots,n+h_k$ are all coprime to $q$. For convenience, let us just think of square-free $q$. If these $k$ events were independent then the answer would be $q(\phi(q)/q)^k$. The correct answer is a little different: for each prime $p$ that divides $q$ we need $n$ to avoid the residue classes $-h_1,\dots,-h_k$ (mod $p$). Let $\nu_{\mathcal H}(p)$ denote the number of distinct residue classes occupied by $\mathcal H$ (mod $p$). Thus $n$ must lie in one of $p-\nu_{\mathcal H}(p)$ residue classes (mod $p$). Using the Chinese remainder theorem we see easily that the correct answer is

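$$\prod_{p\mid q}\big(p-\nu_{\mathcal H}(p)\big)=q\prod_{p\mid q}\Big(1-\frac{\nu_{\mathcal H}(p)}{p}\Big)=q\Big(\frac{\phi(q)}{q}\Big)^k\prod_{p\mid q}\Big(1-\frac{\nu_{\mathcal H}(p)}{p}\Big)\Big(1-\frac1p\Big)^{-k}.$$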

Let us write $\mathfrak S(\mathcal H;q)=\prod_{p\mid q}\big(1-\frac{\nu_{\mathcal H}(p)}{p}\big)\big(1-\frac1p\big)^{-k}$. We have seen that the answer for the number of $n$ (mod $q$) with $n+h_1,\dots,n+h_k$ all being coprime to $q$ involves correcting the guess $q(\phi(q)/q)^k$ by the factor $\mathfrak S(\mathcal H;q)$, which keeps track of the arithmetic properties of the set $\mathcal H$. Now let us consider what happens when we take $q=q_\ell=\prod_{p\le\ell}p$ and let $\ell$ go to infinity. As $\ell\to\infty$ we see that



$$\mathfrak S(\mathcal H;q_\ell)\to\prod_p\Big(1-\frac{\nu_{\mathcal H}(p)}{p}\Big)\Big(1-\frac1p\Big)^{-k}.$$



The infinite product above converges because if $p$ is larger than all the $h_j$'s then $\nu_{\mathcal H}(p)=k$, and so $(1-\nu_{\mathcal H}(p)/p)(1-1/p)^{-k}=(1-k/p)(1-1/p)^{-k}=1+O(p^{-2})$. This infinite product is the singular series²:

$$\mathfrak S(\mathcal H):=\prod_p\Big(1-\frac{\nu_{\mathcal H}(p)}{p}\Big)\Big(1-\frac1p\Big)^{-k}. \tag{1.5}$$



²The term arises from Hardy and Littlewood's original derivation of their conjecture using the circle method. Here $\mathfrak S(\mathcal H)$ arose as a series rather than the product given above.
Further, if $n+h_1,\dots,n+h_k$ are coprime to $q_\ell$ with $\ell$ large, then they have no small prime divisors, and may reasonably be viewed as a kind of approximation to primes. Thus in formulating a conjecture on the number of $n\le x$ with $n+h_1,\dots,n+h_k$ being prime, a natural guess is to take the random answer $x/(\log x)^k$, and multiply it by the arithmetical correction factor $\mathfrak S(\mathcal H)$. This is precisely the Hardy-Littlewood conjecture (1.4). It is immediate from (1.5) that $\mathfrak S(\mathcal H)=0$ if and only if $\mathcal H$ exhausts a complete set of residue classes (mod $p$) for some $p$.
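Once $p$ exceeds all the $h_j$, the factor at $p$ is $1+O(p^{-2})$, so a truncated version of (1.5) already gives good numerical values. The sketch below is our own, and the cut-off $10^5$ is an arbitrary choice. It multiplies the factors for $p\le10^5$. For $\mathcal H=\{0,2\}$ it should reproduce twice the twin prime constant, $2C_2\approx1.3203$, and it returns 0 for a set such as $\{0,2,4\}$ that fills every class mod 3.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: return the list of primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


PRIMES = primes_up_to(10**5)  # assumption: truncation point of the Euler product


def singular_series(hs):
    """Truncation of (1.5): product over p <= 10^5 of
    (1 - nu_H(p)/p) * (1 - 1/p)^(-k), where nu_H(p) = #{h mod p : h in H}."""
    hs = set(hs)
    k = len(hs)
    result = 1.0
    for p in PRIMES:
        nu = len({h % p for h in hs})
        result *= (1 - nu / p) * (1 - 1 / p) ** (-k)
        if result == 0.0:  # H covers every residue class mod p
            break
    return result


if __name__ == "__main__":
    for H in [(0, 2), (0, 4), (0, 6), (0, 2, 6), (0, 2, 4)]:
        print(H, singular_series(H))
```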

Gallagher's calculation. We will now describe Gallagher's argument, using the Hardy-Littlewood conjecture (1.4) to justify the distribution of gaps between primes predicted by the Cramer model. The precise problem we consider is a close variant of question (1.3). Let $\lambda$ be a positive real number, and let $N$ be large. We set $h=\lambda\log N$ and seek to understand the distribution of $\pi(n+h)-\pi(n)$ as $n$ varies over the natural numbers below $N$. To understand this quantity, we consider the moments

$$\frac1N\sum_{n\le N}\big(\pi(n+h)-\pi(n)\big)^r=\frac1N\sum_{n\le N}\Big(\sum_{\substack{1\le\ell\le h\\ n+\ell\ \text{prime}}}1\Big)^r, \tag{1.6}$$

where $r$ is a natural number. If the Cramer prediction is right, then we may expect these moments to be approximately
$$\frac1N\,\mathbb E\Big(\sum_{2\le n\le N}\Big(\sum_{\ell=1}^{h}X(n+\ell)\Big)^r\Big), \tag{1.7}$$

where $\mathbb E$ denotes expectation, and the $X(n)$'s are independent random variables as in Cramer's model. If these moments are roughly equal for all $r\le R$, where $R=R(N)$ tends to infinity, then we would know that $\pi(n+h)-\pi(n)$ has a Poisson distribution with parameter $\lambda$. This is because of a well-known principle from probability, that nice distributions, including the Poisson distribution, are determined by their moments.

Let us expand out the $r$-th powers in (1.6) and (1.7). We then get numbers $\ell_1,\dots,\ell_r$ below $h$, not necessarily distinct, and would like to understand how often $n+\ell_1,\dots,n+\ell_r$ are all prime (for (1.6)), or to understand $\mathbb E(X(n+\ell_1)\cdots X(n+\ell_r))$ (for (1.7)). Let us suppose that there are exactly $k$ distinct numbers among the $\ell_1,\dots,\ell_r$ and write these distinct numbers as $(1\le)\,h_1<h_2<\dots<h_k\,(\le h)$. The number of choices for $\ell_1,\dots,\ell_r$ that lead to the same ordered set of distinct numbers $h_1,\dots,h_k$ is the number of different ways of mapping $\{1,2,\dots,r\}$ onto $\{1,\dots,k\}$; let us denote this³ by $\sigma(r,k)$. Thus we see that (1.6) may be written as

$$\sum_{k=1}^{r}\sigma(r,k)\sum_{1\le h_1<h_2<\dots<h_k\le h}\Big(\frac1N\sum_{\substack{n\le N\\ n+h_1,\dots,n+h_k\ \text{prime}}}1\Big), \tag{1.8}$$



³This is $k!$ times a 'Stirling number of the second kind' $S(r,k)$, the number of partitions of $\{1,\dots,r\}$ into $k$ non-empty blocks.
while (1.7) may be written as
$$\sum_{k=1}^{r}\sigma(r,k)\sum_{1\le h_1<h_2<\dots<h_k\le h}\Big(\frac1N\sum_{2\le n\le N}\mathbb E\big(X(n+h_1)\cdots X(n+h_k)\big)\Big). \tag{1.9}$$



Since the same quantity $\sigma(r,k)$ appears in both (1.8) and (1.9) and is non-negative, we don't need to worry about what $\sigma(r,k)$ is.

Invoking the Hardy-Littlewood conjecture (1.4)⁴ we get that (1.8) is
$$\sim\sum_{k=1}^{r}\frac{\sigma(r,k)}{(\log N)^k}\sum_{1\le h_1<h_2<\dots<h_k\le h}\mathfrak S(\{h_1,\dots,h_k\}).$$
Clearly the quantity in (1.9) is
$$\sim\sum_{k=1}^{r}\frac{\sigma(r,k)}{(\log N)^k}\sum_{1\le h_1<h_2<\dots<h_k\le h}1.$$

Thus, to show that (1.6) and (1.7) are approximately equal, we need only show that
$$\sum_{1\le h_1<h_2<\dots<h_k\le h}\mathfrak S(\{h_1,\dots,h_k\})\sim\sum_{1\le h_1<h_2<\dots<h_k\le h}1. \tag{1.10}$$

This is Gallagher's crucial result in [9]. It shows that although the Hardy-Littlewood 
probabilities are different from the Cramer probabilities, on average they are roughly equal. 
This explains why the Cramer model makes accurate predictions for the distribution of 
primes in such short intervals. 
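The counting function $\sigma(r,k)$ and the common main term can be checked numerically. The sketch below is our own illustration. It verifies by brute force that the number of maps of $\{1,\dots,r\}$ onto $\{1,\dots,k\}$ equals $k!\,S(r,k)$, where $S(r,k)$ denotes a Stirling number of the second kind. It then evaluates $\sum_k\sigma(r,k)\binom hk/(\log N)^k$, which is the main term of (1.9) because there are $\binom hk$ choices of $h_1<\dots<h_k\le h$, with $h=\lambda\log N$. Here $\log N$ is simply treated as a large parameter `L`. As `L` grows this tends to $\sum_kS(r,k)\lambda^k$, which is the $r$-th moment of a Poisson variable with parameter $\lambda$.

```python
from itertools import product
from math import comb, factorial


def surjections_brute(r, k):
    """Number of maps {1..r} -> {1..k} hitting every value (brute force)."""
    return sum(1 for f in product(range(k), repeat=r) if len(set(f)) == k)


def stirling2(r, k):
    """Stirling number of the second kind, via S(i,j) = j S(i-1,j) + S(i-1,j-1)."""
    table = [[0] * (k + 1) for _ in range(r + 1)]
    table[0][0] = 1
    for i in range(1, r + 1):
        for j in range(1, k + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[r][k]


def poisson_moment(r, lam):
    """E[Y^r] for Y ~ Poisson(lam), via the identity E[Y^r] = sum_k S(r,k) lam^k."""
    return sum(stirling2(r, k) * lam**k for k in range(r + 1))


def gallagher_main_term(r, lam, L):
    """sum_{k=1}^r sigma(r,k) C(h,k) / L^k, with sigma(r,k) = k! S(r,k) and h = round(lam*L).
    L plays the role of log N (a hypothetical stand-in for illustration)."""
    h = round(lam * L)
    return sum(factorial(k) * stirling2(r, k) * comb(h, k) / L**k for k in range(1, r + 1))


if __name__ == "__main__":
    for L in (10, 100, 10**4):
        print(L, gallagher_main_term(3, 2.0, L), poisson_moment(3, 2.0))
```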

Exercise 2. For a prime $p$ put $\mathfrak S(\mathcal H;p)=(1-\nu_{\mathcal H}(p)/p)(1-1/p)^{-k}$. Prove that as $h\to\infty$
$$\sum_{1\le h_1<h_2<\dots<h_k\le h}\mathfrak S(\{h_1,\dots,h_k\};p)\sim\sum_{1\le h_1<h_2<\dots<h_k\le h}1.$$
Explain why this morally implies (1.10); better still, prove (1.10) rigorously (or read Gallagher's argument [9]).
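A quick exact experiment illustrates Exercise 2. The sketch below is our own, and the values of $p$, $k$, $h$ are arbitrary. Using rational arithmetic, it averages $\mathfrak S(\mathcal H;p)$ over all $k$-element subsets of $\{1,\dots,h\}$. For comparison it also averages over ordered $k$-tuples with repetition allowed. When $p\mid h$ the residues of such tuples are uniformly distributed and the expected number of classes missed is $p(1-1/p)^k$, so that average is exactly 1. Requiring the $h_j$ to be distinct changes this only by $O(1/h)$ for fixed $k$ and $p$.

```python
from fractions import Fraction
from itertools import combinations, product


def local_factor(hs, p):
    """S(H; p) = (1 - nu/p) * (1 - 1/p)^(-k), where nu is the number of classes mod p
    occupied by hs and k = len(hs); returned as an exact Fraction."""
    nu = len({h % p for h in hs})
    return (1 - Fraction(nu, p)) * (1 - Fraction(1, p)) ** (-len(hs))


def average_over_sets(p, k, h):
    """Exact average of S(H; p) over k-element subsets H of {1, ..., h}."""
    values = [local_factor(hs, p) for hs in combinations(range(1, h + 1), k)]
    return sum(values, Fraction(0)) / len(values)


def average_over_tuples(p, k, h):
    """The same average over all ordered k-tuples in {1, ..., h}^k (repeats allowed)."""
    tuples = product(range(1, h + 1), repeat=k)
    return sum((local_factor(hs, p) for hs in tuples), Fraction(0)) / h**k


if __name__ == "__main__":
    for h in (30, 90, 300):
        print(h, float(average_over_sets(3, 2, h)))
```

For instance, with $p=3$, $k=2$ and $h=30$ the subset average is $57/58$, while the tuple average is exactly 1.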

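Exercise 3. We have sketched how the Hardy-Littlewood conjecture implies that for a given positive real number $\lambda$, and a fixed non-negative integer $k$,
$$\frac1N\#\{n\le N:\ \pi(n+\lambda\log N)-\pi(n)=k\}\sim\frac{\lambda^k}{k!}e^{-\lambda}.$$
Deduce that
$$\frac1N\#\Big\{2\le n\le N:\ \frac{p_{n+1}-p_n}{\log n}\in[\alpha,\beta]\Big\}\sim\int_\alpha^\beta e^{-t}\,dt.$$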



⁴Precisely, we need this conjecture uniformly for all $h_1,\dots,h_k$ below $h$, and for all $k\le R$ with $R=R(N)$ tending slowly to infinity.
Proof of (1.10) when $k=2$. From the definition (1.5) note that $\mathfrak S(\{h_1,h_2\})=\mathfrak S(\{0,h_2-h_1\})$ and so, letting $\ell=h_2-h_1$, we see that the LHS of (1.10) is (in the case $k=2$)
$$\sum_{\ell<h}\mathfrak S(\{0,\ell\})\sum_{\substack{1\le h_1<h_2\le h\\ h_2-h_1=\ell}}1=\sum_{\ell<h}\mathfrak S(\{0,\ell\})(h-\ell).$$

To evaluate the above asymptotically, it is useful to study the generating Dirichlet series
$$F(s)=\sum_{\ell=1}^{\infty}\frac{\mathfrak S(\{0,\ell\})}{\ell^s}.$$

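The definition (1.5) gives that $\mathfrak S(\{0,\ell\})=\prod_{p\mid\ell}\big(1-\frac1p\big)^{-1}\prod_{p\nmid\ell}\big(1-\frac2p\big)\big(1-\frac1p\big)^{-2}$. From this, we may see that $F(s)$ converges absolutely in the half-plane $\mathrm{Re}(s)>1$, and moreover in that region has the Euler product
$$F(s)=\prod_p\Big(\Big(1-\frac2p\Big)\Big(1-\frac1p\Big)^{-2}+\Big(1-\frac1p\Big)^{-1}\Big(\frac1{p^s}+\frac1{p^{2s}}+\dots\Big)\Big).$$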

Multiplying and dividing by $\zeta(s)=\prod_p(1-1/p^s)^{-1}$ we see (with a little calculation) that in $\mathrm{Re}(s)>1$,
$$F(s)=\zeta(s)\prod_p\Big(1-\frac{1}{(p-1)^2}+\frac{p^{1-s}}{(p-1)^2}\Big)=\zeta(s)G(s),$$
say. The Euler product for $G(s)$ converges absolutely in $\mathrm{Re}(s)>0$, and so in that region we have obtained a meromorphic continuation of $F(s)$ with a simple pole at $s=1$ coming from the simple pole of $\zeta(s)$ there. We now make use of the formula that for any $c>0$
$$\frac1{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{y^s}{s(s+1)}\,ds=\begin{cases}1-1/y&\text{if }y\ge1,\\0&\text{if }0<y<1.\end{cases}$$

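This is easily proved by moving the line of integration to the left if $y>1$ and to the right if $y<1$; the term $1-1/y$ when $y>1$ arises from the residues of the poles at $s=0$ and $s=-1$. Therefore, if $c>1$, we see that
$$\sum_{\ell\le h}\mathfrak S(\{0,\ell\})(h-\ell)=h\sum_{\ell\le h}\mathfrak S(\{0,\ell\})\Big(1-\frac{\ell}{h}\Big)=\frac{h}{2\pi i}\int_{c-i\infty}^{c+i\infty}F(s)\frac{h^s}{s(s+1)}\,ds,$$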

where the interchange of summation and integration is justified by the absolute convergence of $F(s)$ in the region $\mathrm{Re}(s)>1$. To evaluate the contour integral, we shift the line of integration to $\mathrm{Re}(s)=\varepsilon>0$. In the region traversed we encounter only a simple pole at $s=1$ (because of $\zeta(s)$), and so our integral is
$$h\,\mathop{\mathrm{Res}}_{s=1}\Big(\zeta(s)G(s)\frac{h^s}{s(s+1)}\Big)+\frac{h}{2\pi i}\int_{\varepsilon-i\infty}^{\varepsilon+i\infty}\zeta(s)G(s)\frac{h^s}{s(s+1)}\,ds.$$



Since $G(1)$ is easily seen to be 1, the residue term above equals $G(1)h^2/2=h^2/2$. By bounding $\zeta(s)$ and $G(s)$ on the line $\mathrm{Re}(s)=\varepsilon$ we may estimate the remaining integral on that line; we omit the standard, but technical, details and merely note that this term is $O(h^{1+\varepsilon})$. We conclude that
$$\sum_{\ell\le h}\mathfrak S(\{0,\ell\})(h-\ell)=\frac{h^2}{2}+O(h^{1+\varepsilon})\sim\sum_{1\le h_1<h_2\le h}1.$$
This proves (1.10) in the case $k=2$.

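Exercise 4. Analyze $G(s)$ further by writing it as $\zeta(s+1)H(s)$, where $H(s)$ is now analytic in $\mathrm{Re}(s)>-\frac12$. Evaluating residues, as above, prove that
$$\sum_{\ell\le h}\mathfrak S(\{0,\ell\})(h-\ell)=\frac{h^2}{2}-\frac h2\log h+\frac B2\,h+O(h^{\frac12+\varepsilon}),$$
with $B=1-\gamma-\log2\pi$; here $\gamma$ is Euler's constant.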
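The two-term behaviour can be tested numerically. From (1.5), the factor of $\mathfrak S(\{0,\ell\})$ at $p=2$ is 0 for odd $\ell$ and 2 for even $\ell$. The factor at an odd $p\mid\ell$ is $p/(p-1)$, and every other factor is $1-1/(p-1)^2$. Hence, for even $\ell$, $\mathfrak S(\{0,\ell\})=2C_2\prod_{p\mid\ell,\,p>2}\frac{p-1}{p-2}$ with $C_2=\prod_{p>2}(1-(p-1)^{-2})$. The sketch below is our own; the prime cut-off $10^5$ for $C_2$ and the values of $h$ are arbitrary. It compares $\sum_{\ell\le h}\mathfrak S(\{0,\ell\})(h-\ell)$ with $\frac{h^2}{2}-\frac h2\log h+\frac B2h$, which is half of the expression $h^2-h\log h+Bh$ used in the variance computation of Lecture 2.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant (hard-coded; math has no such constant)
B = 1 - EULER_GAMMA - math.log(2 * math.pi)


def primes_up_to(n):
    """Sieve of Eratosthenes: return the list of primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def twin_prime_constant(bound=10**5):
    """C_2 = prod_{2 < p <= bound} (1 - 1/(p-1)^2), a truncation of the infinite product."""
    c = 1.0
    for p in primes_up_to(bound):
        if p > 2:
            c *= 1 - 1 / (p - 1) ** 2
    return c


def odd_prime_divisors(n):
    """Distinct odd primes dividing n, by trial division."""
    while n % 2 == 0:
        n //= 2
    divisors, d = [], 3
    while d * d <= n:
        if n % d == 0:
            divisors.append(d)
            while n % d == 0:
                n //= d
        d += 2
    if n > 1:
        divisors.append(n)
    return divisors


def pair_singular_series(ell, c2):
    """S({0, ell}): zero for odd ell, else 2 C_2 prod_{p | ell, p > 2} (p-1)/(p-2)."""
    if ell % 2:
        return 0.0
    value = 2 * c2
    for p in odd_prime_divisors(ell):
        value *= (p - 1) / (p - 2)
    return value


def weighted_pair_sum(h, c2):
    """sum_{ell <= h} S({0, ell}) (h - ell)."""
    return sum(pair_singular_series(ell, c2) * (h - ell) for ell in range(1, h + 1))


def predicted(h):
    return h * h / 2 - (h / 2) * math.log(h) + (B / 2) * h


if __name__ == "__main__":
    c2 = twin_prime_constant()
    for h in (100, 500, 2000):
        print(h, weighted_pair_sum(h, c2), predicted(h), h * h / 2)
```

Already for $h=10$ the sum is about 31.69 against a prediction of about 31.41, and the agreement is much better than the size of the $\frac h2\log h$ term.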

Concluding remarks. Two important consequences of our predictions for the spacings between primes are that
$$\limsup_{n\to\infty}\frac{p_{n+1}-p_n}{\log n}=\infty,\quad\text{and}\quad\liminf_{n\to\infty}\frac{p_{n+1}-p_n}{\log n}=0.$$
Happily both these results have now been proved. The first involves constructing long strings of composite numbers, and was first proved by Westzynthius with important refinements due to Erdős and Rankin. The second is a recent breakthrough of Goldston, Pintz and Yıldırım; see [12]. The reader may consult the survey by Heath-Brown [18] for the limsup result and much else besides, and my survey [31] for the liminf result.

Lecture 2: The distribution of primes in longer intervals 

Cramer's prediction. In the first lecture we considered the distribution of primes in intervals of length a constant times the average spacing. We now discuss what happens in longer intervals. Precisely, we consider $\pi(n+h)-\pi(n)$ for $n\le N$ where $h/\log N$ is large, but $h/N$ is small.

Exercise 5. Using Stirling's formula, show that as $\lambda$ gets large, a Poisson distribution with parameter $\lambda$ begins to look like a normal distribution with mean $\lambda$ and variance $\lambda$.
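Exercise 5 can be illustrated numerically. The sketch below is our own, and the values of $\lambda$ are arbitrary. It computes the Poisson probabilities in log space and reports the largest gap between them and the normal density with mean and variance $\lambda$, relative to the peak height $1/\sqrt{2\pi\lambda}$. The discrepancy should shrink roughly like $\lambda^{-1/2}$, which is the skewness of the Poisson distribution.

```python
import math


def poisson_pmf(k, lam):
    """Poisson(lam) probability of k, computed in log space to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def normal_density(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def relative_discrepancy(lam):
    """max_k |Poisson pmf - normal density| over 0 <= k <= lam + 10 sqrt(lam),
    divided by the normal peak height 1/sqrt(2 pi lam)."""
    ks = range(int(lam + 10 * math.sqrt(lam)) + 1)
    worst = max(abs(poisson_pmf(k, lam) - normal_density(k, lam, lam)) for k in ks)
    return worst / normal_density(lam, lam, lam)


if __name__ == "__main__":
    for lam in (25.0, 100.0, 400.0, 1600.0):
        print(lam, relative_discrepancy(lam))
```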

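Thus Cramer's model would suggest that, if $h/\log N$ is large but $h/N$ is small, then for $n\le N$, $\pi(n+h)-\pi(n)$ has an approximately normal distribution with mean $\sim h/\log N$ and variance $\sim h/\log N$. Another way to arrive at this prediction is to calculate the moments (note that for most $n\le N$, $\sum_{\ell=1}^h1/\log(n+\ell)\approx h/\log N$)
$$\frac1N\,\mathbb E\Big(\sum_{2\le n\le N}\Big(\sum_{\ell=1}^{h}\Big(X(n+\ell)-\frac{1}{\log(n+\ell)}\Big)\Big)^k\Big), \tag{2.1a}$$
which we claim is
$$\sim\frac{k!}{2^{k/2}(k/2)!}\Big(\frac{h}{\log N}\Big)^{k/2} \tag{2.1b}$$
if $k$ is even, and if $k$ is odd it is
$$\ll_k\Big(\frac{h}{\log N}\Big)^{(k-1)/2}. \tag{2.1c}$$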

Exercise 6. Justify (2.1a)-(2.1c) by arguing as follows. For $n\ge3$, set $X_0(n)=1-1/\log n$ with probability $1/\log n$ and $X_0(n)=-1/\log n$ with probability $1-1/\log n$; that is, $X_0(n)=X(n)-1/\log n$. Note that $\mathbb E(X_0(n))=0$. Expand
$$\frac1N\,\mathbb E\Big(\sum_{2\le n\le N}\Big(\sum_{1\le\ell\le h}X_0(n+\ell)\Big)^k\Big)=\frac1N\sum_{\ell_1,\dots,\ell_k\le h}\ \sum_{2\le n\le N}\mathbb E\big(X_0(n+\ell_1)\cdots X_0(n+\ell_k)\big).$$
The expectation above is zero if any of the $\ell_i$'s occurs only once among $\ell_1,\dots,\ell_k$. When $k$ is even there is a leading contribution from terms where the $\ell_1,\dots,\ell_k$ contain $k/2$ distinct numbers each occurring twice.
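For Exercise 6 it may help to see the Gaussian pairing count emerge in the simplest case. As a simplification of Cramer's model, assume that all $h$ probabilities equal a single value $p\approx1/\log N$. The count in a window is then binomial with variance $\sigma^2=hp(1-p)\sim h/\log N$. The sketch below is our own, and $h=2000$, $p=1/20$ are arbitrary. It computes the central moments exactly from the binomial probabilities. The even ones come out close to $\frac{k!}{2^{k/2}(k/2)!}\sigma^k$, which is the number of ways to split $k$ indices into pairs times $\sigma^k$. The third moment is only of size $\sigma^2$.

```python
import math


def binomial_pmf(n, p):
    """Binomial(n, p) probabilities via the ratio P(j+1)/P(j) = (n-j)p / ((j+1)(1-p))."""
    q = 1 - p
    pmf = [q**n]
    for j in range(n):
        pmf.append(pmf[-1] * (n - j) / (j + 1) * p / q)
    return pmf


def central_moment(pmf, k):
    mean = sum(j * w for j, w in enumerate(pmf))
    return sum((j - mean) ** k * w for j, w in enumerate(pmf))


def pairings(k):
    """Ways to split k (even) labelled indices into pairs: k! / (2^(k/2) (k/2)!)."""
    return math.factorial(k) // (2 ** (k // 2) * math.factorial(k // 2))


if __name__ == "__main__":
    h, p = 2000, 1 / 20  # assumption: p stands in for 1/log N
    pmf = binomial_pmf(h, p)
    sigma2 = h * p * (1 - p)
    for k in (2, 3, 4, 5, 6):
        gauss = pairings(k) * sigma2 ** (k / 2) if k % 2 == 0 else 0.0
        print(k, central_moment(pmf, k), gauss)
```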

Calculating the variance via Hardy-Littlewood. However, we do not believe that this prediction, given by the Cramer model, is accurate. At this juncture, it is more convenient to deal with $\psi(n+h)-\psi(n)$, where $\psi(x)=\sum_{n\le x}\Lambda(n)$ with $\Lambda(n)$ denoting the von Mangoldt function. Note that the prime number theorem is equivalent to $\psi(x)\sim x$, and that the Hardy-Littlewood conjecture (1.4) may be recast as ($\mathcal H=\{h_1,\dots,h_k\}$ is a set of $k$ distinct numbers)
$$\sum_{n\le x}\Lambda(n+h_1)\cdots\Lambda(n+h_k)\sim\mathfrak S(\mathcal H)\,x. \tag{2.2}$$
The Cramer model predicts that $\psi(n+h)-\psi(n)$ is approximately normal with mean $\sim h$ and variance $\sim h\log N$.

To see the flaw in this prediction, let us now calculate the variance using the Hardy-Littlewood conjectures. Note that
$$\frac1N\sum_{n\le N}\big(\psi(n+h)-\psi(n)-h\big)^2=\frac1N\sum_{n\le N}\Big(\sum_{\ell\le h}\Lambda(n+\ell)\Big)^2-\frac{2h}{N}\sum_{n\le N}\sum_{\ell\le h}\Lambda(n+\ell)+h^2.$$
The middle term in the RHS above is $-2h^2(\psi(N)+O(h\log N))/N\sim-2h^2$. As for the first term in the RHS, we may square it out and invoke the Hardy-Littlewood conjecture (2.2). If we forget all the error terms, then the above is
$$\frac1N\sum_{n\le N}\sum_{\ell\le h}\Lambda(n+\ell)^2+2\sum_{\ell<h}\mathfrak S(\{0,\ell\})(h-\ell)-h^2.$$
The prime number theorem and partial summation give that the first term above is $\sim h(\log N-1)$, while from Exercise 4 we see that the second term above is $\sim h^2-h\log h+Bh$. So, ignoring all error terms, we conclude that the variance satisfies
$$\frac1N\sum_{n\le N}\big(\psi(n+h)-\psi(n)-h\big)^2\sim h\Big(\log\frac Nh+B-1\Big), \tag{2.3}$$
which is different from the $\sim h\log N$ predicted by Cramer's model.

Exercise 7. Assume that the Hardy-Littlewood conjecture (2.2) holds in the quantitative form
$$\sum_{n\le x}\Lambda(n+h_1)\cdots\Lambda(n+h_k)=\mathfrak S(\mathcal H)\,x+O(x^{\frac12+\varepsilon}),$$
uniformly for $k\le K$, and distinct $h_j$ satisfying $1\le h_j\le x$. Using this, obtain (2.3) with an error term of $O(h^{\frac12+\varepsilon}+h^2N^{-\frac12+\varepsilon}+h^3N^{-1})$. Thus, even assuming the quantitative Hardy-Littlewood conjectures, one knows (2.3) only for $h\le N^{\frac12-\varepsilon}$.

So although the Hardy-Littlewood probabilities and the Cramer probabilities are roughly equal on average, significant deviations show up when we consider $h$ to be a small power of $N$. We believe that (2.3) is the right asymptotic for the variance, and that the Cramer model predicts the wrong answer.
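The difference between (2.3) and the Cramer prediction $h\log N$ is already visible numerically. The sketch below is our own, and $N=10^6$ and $h=50$ are arbitrary choices. It sieves $\Lambda(m)$, forms $\psi$ by prefix sums, and computes the empirical mean square of $\psi(n+h)-\psi(n)-h$ over $n\le N$. It then prints this value next to $h(\log\frac Nh+B-1)$ and $h\log N$. Since (2.3) ignores error terms, only rough agreement should be expected.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant
B = 1 - EULER_GAMMA - math.log(2 * math.pi)


def von_mangoldt_table(limit):
    """lam[m] = Lambda(m) for 0 <= m <= limit: log p if m is a power of a prime p, else 0."""
    lam = [0.0] * (limit + 1)
    sieve = bytearray([1]) * (limit + 1)
    for p in range(2, limit + 1):
        if sieve[p]:
            if p * p <= limit:
                sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
            logp, pk = math.log(p), p
            while pk <= limit:
                lam[pk] = logp
                pk *= p
    return lam


def mean_square_deviation(N, h):
    """(1/N) * sum_{n <= N} (psi(n+h) - psi(n) - h)^2."""
    lam = von_mangoldt_table(N + h)
    psi = [0.0] * (N + h + 1)
    for m in range(1, N + h + 1):
        psi[m] = psi[m - 1] + lam[m]
    return sum((psi[n + h] - psi[n] - h) ** 2 for n in range(1, N + 1)) / N


def hardy_littlewood_prediction(N, h):
    return h * (math.log(N / h) + B - 1)  # the right side of (2.3)


def cramer_prediction(N, h):
    return h * math.log(N)


if __name__ == "__main__":
    N, h = 10**6, 50
    print("empirical:", mean_square_deviation(N, h))
    print("(2.3):    ", hardy_littlewood_prediction(N, h))
    print("Cramer:   ", cramer_prediction(N, h))
```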

The variance and zeros of the zeta function. Here is the sketch of a very different calculation which leads to the same answer as (2.3). Riemann's explicit formula (see [6]) says that
$$\psi(x)=x-\sum_\rho\frac{x^\rho}{\rho}+\text{negligible terms}.$$
Here $\rho$ runs over the non-trivial zeros of the Riemann zeta-function. We assume the Riemann hypothesis and write $\rho=\frac12+i\gamma$. The sum over zeros is only conditionally convergent, but we will argue loosely, omitting such considerations. It follows that
$$\psi(x+h)-\psi(x)-h=-\sum_\rho\frac{(x+h)^\rho-x^\rho}{\rho}+\text{negligible terms}.$$
The sum over zeros above is weighted down with a factor $1/|\rho|$, and so we may expect that large zeros make a minor contribution. It turns out that zeros with $|\gamma|>x/h$ are not so
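important. For the small zeros, we replace $(x+h)^\rho-x^\rho$ by the Taylor approximation $\rho hx^{\rho-1}$. Therefore we may expect that
$$\frac1X\int_X^{2X}\big(\psi(x+h)-\psi(x)-h\big)^2dx\approx\frac1X\int_X^{2X}\Big|\sum_{|\gamma|<X/h}hx^{\rho-1}\Big|^2dx=\frac{h^2}{X}\sum_{|\gamma_1|,|\gamma_2|<X/h}X^{i(\gamma_1-\gamma_2)}\,\frac{2^{i(\gamma_1-\gamma_2)}-1}{i(\gamma_1-\gamma_2)}, \tag{2.4}$$
where the terms with $\gamma_1=\gamma_2$ are to be read as $\log2$.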

There are $\ll\log T$ zeros of the zeta-function with ordinates lying between $T$ and $T+1$. Using this observation in (2.4), and estimating the magnitude of the sums over zeros there, we "deduce" that, assuming RH,⁵
$$\frac1X\int_X^{2X}\big(\psi(x+h)-\psi(x)-h\big)^2dx\ll h\Big(1+\log\frac Xh\Big)^2. \tag{2.5}$$

A result like this was established by Selberg [30]. If we want an asymptotic in (2.4), then we need some understanding of the spacings $\gamma_1-\gamma_2$ between zeros of the Riemann zeta-function. Such an understanding is provided by the pair correlation conjecture of Montgomery [24], which predicts that these ordinates are distributed like eigenvalues of large random matrices. Using such information Mueller obtained an asymptotic formula much like (2.3), and Goldston and Montgomery [11] showed conversely that a formula like (2.3) also conveys information about the zeros of $\zeta(s)$. For more discussion on this set of ideas consult Goldston's recent survey [10].

Higher moments. Recently, Montgomery and I (see [25] and [26]) used a quantitative form of the Hardy-Littlewood conjecture (see Exercise 7) to study higher moments of $\psi(n+h)-\psi(n)-h$. We now describe these results briefly. They support the conjecture that if $(\log N)^{1+\delta}\le h\le N^{1-\delta}$ then for $n\le N$ the distribution of $\psi(n+h)-\psi(n)$ is approximately normal with mean $h$ and variance $h\log(N/h)$.
We assume that $(\log N)^{1+\delta}\le h\le N^{1-\delta}$ and wish to evaluate
$$\frac1N\sum_{n\le N}\big(\psi(n+h)-\psi(n)-h\big)^r. \tag{2.6}$$

n<N 

For even $r$ we expect that this is $\sim\frac{r!}{2^{r/2}(r/2)!}\big(h\log\frac Nh\big)^{r/2}$, while for odd $r$ we expect it to be $o\big((h\log\frac Nh)^{r/2}\big)$. If we simply expanded $(\psi(n+h)-\psi(n)-h)^r$ in powers of $\psi(n+h)-\psi(n)$ and $h$ (as we did in the case $r=2$), then we would get many terms all of size $h^r$, and a careful cancellation of these and lower order terms is needed before we get to the actual delicate main term of size essentially $h^{r/2}$. To circumvent this, we define $\Lambda_0(n)=\Lambda(n)-1$, in analogy with Exercise 6. This eliminates the unnecessary higher order terms at the outset, and simplifies calculations considerably. For other situations where this trick helps, see my paper with Granville [15] in this volume. Using this notation, and expanding (2.6), we want to understand
$$\sum_{\ell_1,\dots,\ell_r\le h}\frac1N\sum_{n\le N}\Lambda_0(n+\ell_1)\cdots\Lambda_0(n+\ell_r). \tag{2.7}$$



⁵In fact we would only deduce $\ll h(1+\log X/h)^3$, but the extra "log" may be removed by smoothing.
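Exercise 8. Define the modified singular series $\mathfrak S_0(\mathcal H)$ by
$$\mathfrak S_0(\mathcal H)=\sum_{\mathcal J\subseteq\mathcal H}(-1)^{|\mathcal H\setminus\mathcal J|}\,\mathfrak S(\mathcal J),\quad\text{so that}\quad\mathfrak S(\mathcal H)=\sum_{\mathcal J\subseteq\mathcal H}\mathfrak S_0(\mathcal J).$$
Here we understand that $\mathfrak S(\emptyset)=\mathfrak S_0(\emptyset)=1$. Show that the quantitative Hardy-Littlewood conjecture of Exercise 7 is the same as
$$\sum_{n\le x}\Lambda_0(n+h_1)\cdots\Lambda_0(n+h_k)=\mathfrak S_0(\mathcal H)\,x+O(x^{\frac12+\varepsilon}),$$
keeping the hypotheses there.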
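The inclusion-exclusion in Exercise 8 is mechanical enough to check by machine. The sketch below is our own. It uses a truncation of (1.5) at primes up to $10^4$ (an arbitrary cut-off), with $\mathfrak S(\emptyset)=1$. It forms $\mathfrak S_0(\mathcal H)$ by summing over subsets and checks the inversion $\mathfrak S(\mathcal H)=\sum_{\mathcal J\subseteq\mathcal H}\mathfrak S_0(\mathcal J)$. Since $\mathfrak S(\{h\})=1$, it also follows that $\mathfrak S_0$ vanishes on singletons and that $\mathfrak S_0(\{h_1,h_2\})=\mathfrak S(\{h_1,h_2\})-1$.

```python
import math
from itertools import combinations


def primes_up_to(n):
    """Sieve of Eratosthenes: return the list of primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


PRIMES = primes_up_to(10**4)  # assumption: truncation point of (1.5)


def singular_series(hs):
    """Truncated (1.5) over p <= 10^4; the empty set gets the value 1."""
    hs = set(hs)
    result = 1.0
    for p in PRIMES:
        nu = len({h % p for h in hs})
        result *= (1 - nu / p) * (1 - 1 / p) ** (-len(hs))
    return result


def subsets(hs):
    """All subsets of the tuple hs, as tuples."""
    for size in range(len(hs) + 1):
        yield from combinations(hs, size)


def modified_singular_series(hs):
    """S_0(H) = sum over J subset of H of (-1)^(|H| - |J|) S(J)."""
    return sum((-1) ** (len(hs) - len(J)) * singular_series(J) for J in subsets(hs))


if __name__ == "__main__":
    H = (0, 2, 6)
    print("S(H)   =", singular_series(H))
    print("S_0(H) =", modified_singular_series(H))
```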

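For simplicity, consider first the terms in (2.7) when the $\ell_i$ are distinct. If we use the asymptotic of Exercise 8 we are led to the problem of evaluating
$$\sum_{\substack{h_1,\dots,h_k\le h\\ h_i\ \text{distinct}}}\mathfrak S_0(\mathcal H),$$
which is a problem analogous to, but more delicate than, Gallagher's calculation (1.10). The main result in [26] is the asymptotic
$$\sum_{\substack{h_1,\dots,h_k\le h\\ h_i\ \text{distinct}}}\mathfrak S_0(\mathcal H)=\begin{cases}\frac{k!}{2^{k/2}(k/2)!}(-h\log h)^{k/2}\,(1+o(1))&\text{if }k\text{ is even},\\ o\big((h\log h)^{k/2}\big)&\text{if }k\text{ is odd}.\end{cases} \tag{2.8}$$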

Exercise 9. Show the following refinement of Gallagher's (1.10):
$$\sum_{\substack{h_1,\dots,h_k\le h\\ h_i\ \text{distinct}}}\mathfrak S(\mathcal H)=h^k-\binom k2h^{k-1}\log h+O(h^{k-1}).$$



Returning to (2.7), we must analyze the terms when the $\ell_i$ are not necessarily distinct. Suppose that $h_1,\dots,h_k$ are the distinct elements among $\ell_1,\dots,\ell_r$, and that each $h_i$ appears $m_i\ (\ge1)$ times among the $\ell_j$. After a little combinatorics, we may write (2.7) as
$$\sum_{k=1}^{r}\ \sum_{\substack{m_1,\dots,m_k\ge1\\ \sum m_i=r}}\frac{r!}{m_1!\cdots m_k!}\,\frac1{k!}\sum_{\substack{h_1,\dots,h_k\le h\\ h_j\ \text{distinct}}}\frac1N\sum_{n=1}^{N}\prod_{i=1}^{k}\Lambda_0(n+h_i)^{m_i}. \tag{2.9}$$

We must distinguish the indices where $m_i=1$ and the remaining indices where $m_i>1$. Let $\mathcal I$ denote the subset of $\{1,\dots,k\}$ such that $m_i=1$ for $i\in\mathcal I$. For $i\notin\mathcal I$ (so $m_i\ge2$) we think of $\Lambda_0(n+h_i)^{m_i}$ as being essentially $(\log N)^{m_i-1}\Lambda(n+h_i)$: the point is that both quantities have about the same expected value $(\log N)^{m_i-1}$, unlike the case when $m_i=1$, where the expected values of $\Lambda(n+h_i)$ and $\Lambda_0(n+h_i)$ are 1 and 0 respectively. Therefore the inner sum over $n$ in (2.9) is essentially
$$\frac{(\log N)^{r-k}}{N}\sum_{n=1}^{N}\prod_{i\in\mathcal I}\Lambda_0(n+h_i)\prod_{\substack{1\le i\le k\\ i\notin\mathcal I}}\big(\Lambda_0(n+h_i)+1\big)=\frac{(\log N)^{r-k}}{N}\sum_{\mathcal I\subseteq\mathcal J\subseteq\{1,\dots,k\}}\ \sum_{n=1}^{N}\prod_{j\in\mathcal J}\Lambda_0(n+h_j).$$
Now we invoke the Hardy-Littlewood conjecture of Exercise 8, and use (2.8).

Exercise 10. Complete the details in evaluating (2.7). Show that when $r$ is odd or any of the $m_i$'s is $\ge3$ we get a contribution of $o\big((h\log N)^{r/2}\big)$. In the case $r$ is even, the main term $\frac{r!}{2^{r/2}(r/2)!}\big(h\log\frac Nh\big)^{r/2}$ arises from contributions to (2.9) where the $m_i$ are all 1 or 2.

The proof of (2.8) is quite complicated, and we do not go into it here. Let us however point out one important ingredient. While motivating the Hardy-Littlewood conjecture in Lecture 1, we considered the toy problem of reduced residues (mod $q$). If $1=a_1<a_2<\dots<a_{\phi(q)}<q$ are the reduced residues below $q$, then we may ask for the distribution of $(a_{i+1}-a_i)(\phi(q)/q)$; we have multiplied by $\phi(q)/q$ so that this is 1 'on average.' If, for example, $q$ is the product of the first $\ell$ primes, then as in Lecture 1 we may think of these $a_i$ as being like primes, and expect that, for $0\le\alpha<\beta$,
$$\#\Big\{1\le i\le\phi(q)-1:\ (a_{i+1}-a_i)\frac{\phi(q)}{q}\in[\alpha,\beta]\Big\}\sim\phi(q)\int_\alpha^\beta e^{-x}\,dx.$$
A beautiful result of Hooley [20] shows that this holds provided $\phi(q)/q$ is small. Obviously, some restriction on $\phi(q)/q$ is necessary; for example, if $q$ is prime then clearly $a_{i+1}-a_i=1$ for $1\le i\le\phi(q)-1$. Moreover, Montgomery and Vaughan [27] have even estimated the moments:
moments: 

1 . ^/_A,fc 



n=l l<h ^ 

The proof of (2.8) builds on the techniques developed there. 
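Hooley's gap distribution for reduced residues can be explored numerically. The sketch below is our own. It takes $q=2\cdot3\cdot5\cdot7\cdot11\cdot13\cdot17=510510$, an arbitrary choice for which $\phi(q)/q\approx0.18$ is only moderately small. It lists the reduced residues and tabulates the normalized gaps $(a_{i+1}-a_i)\phi(q)/q$ against $\int_\alpha^\beta e^{-x}\,dx$. Agreement should be rough at this size.

```python
import math


def reduced_residues(q):
    """The reduced residues 1 <= a < q with gcd(a, q) = 1, in increasing order."""
    return [a for a in range(1, q) if math.gcd(a, q) == 1]


def normalized_residue_gaps(q):
    """(a_{i+1} - a_i) * phi(q)/q for consecutive reduced residues a_i."""
    res = reduced_residues(q)
    scale = len(res) / q  # phi(q)/q
    return [(b - a) * scale for a, b in zip(res, res[1:])]


if __name__ == "__main__":
    q = 2 * 3 * 5 * 7 * 11 * 13 * 17
    gaps = normalized_residue_gaps(q)
    for a, b in [(0.0, 0.5), (0.5, 1.0), (1.0, 2.0), (2.0, 4.0)]:
        obs = sum(a <= g <= b for g in gaps) / len(gaps)
        print(f"[{a}, {b}]: observed {obs:.3f}, e^-x predicts {math.exp(-a) - math.exp(-b):.3f}")
```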

In our discussion above we have ignored error terms altogether. If one argues carefully using the quantitative Hardy-Littlewood conjectures of Exercises 7 and 8, we can evaluate the $r$-th moment (2.6) provided that $h\le N^{1/r-\varepsilon}$. We expect that the same asymptotics hold even when $h$ is larger, with $h\le N^{1-\varepsilon}$. Thus these arguments suggest that for $(\log N)^{1+\delta}\le h\le N^{1-\delta}$, the distribution of $\psi(n+h)-\psi(n)$ (for $n\le N$) is approximately normal with mean $h$ and variance $h\log(N/h)$. For numerical support for this conjecture, see [3] and [25]. For other work related to this circle of ideas, see [3] and [4].
Connections with zeros of $\zeta(s)$. We mentioned earlier the work of Goldston and Montgomery relating the variance of primes in short intervals to the pair correlation of zeros of $\zeta(s)$. Our calculations on the higher moments of primes in short intervals suggest that if $X\ge T^{1+\varepsilon}$ then⁶
$$\int_X^{2X}\Big(\sum_{|\gamma|\le T}x^{i\gamma}\Big)^k dx=\int_X^{2X}\Big(\sum_{0<\gamma\le T}2\cos(\gamma\log x)\Big)^k dx$$
is $\sim\frac{k!}{2^{k/2}(k/2)!}X\big(2N(T)\big)^{k/2}$ if $k$ is even, and is $o\big(XN(T)^{k/2}\big)$ if $k$ is odd. Here $N(T)\sim\frac{T}{2\pi}\log T$ denotes the number of zeros of $\zeta(s)$ with $0<\gamma\le T$. Viewed this way, Montgomery's pair correlation conjecture may be thought of as saying that for $x\ge T^{1+\varepsilon}$ the sum $\sum_{0<\gamma\le T}\cos(\gamma\log x)$ behaves like a sum of uncorrelated random variables. The higher moments suggest that it behaves in fact like a sum of independent random variables.⁷ These statements are quite vague, and it would be nice to flesh out the precise connection between these higher moments and zeros of $\zeta(s)$. For other connections between zeros of $\zeta(s)$ and Hardy-Littlewood type conjectures see [2].
Chebyshev's bias. We have considered above the distribution of primes in short intervals. What happens to the distribution in long intervals $[1,x]$? That is, what can be said about the distribution of $\psi(x)-x$? Assuming RH, we get from Riemann's explicit formula that this is essentially $-2x^{\frac12}\,\mathrm{Re}\sum_{\gamma>0}x^{i\gamma}/(\frac12+i\gamma)$. It is expected⁸ that the zeros of $\zeta(s)$ are all simple, and have no non-trivial $\mathbb Q$-linear relations among them. In that case the sum over zeros above may be modeled by $\mathrm{Re}\sum_{\gamma>0}Z(\gamma)/(\frac12+i\gamma)$, where the $Z(\gamma)$ are independent random variables taking uniformly distributed values on the unit circle. Precisely, as $t$ varies from 1 to $T$, the distribution of $(\psi(e^t)-e^t)/(2e^{t/2})$ is like the distribution of our random sum above.⁹ This is a certain non-universal distribution, which has been investigated in, for example, [23] and [29]. To gain a flavor of this distribution the reader may contemplate $\sum_nX_n/n$, where the $X_n$ are independent random variables taking the values $\pm1$ with equal probability.

The distribution above is symmetric about the origin, and so $\psi(x)$ is as likely to be larger than $x$ as it is to be smaller than $x$. However, $\psi(x)=\theta(x)+\theta(x^{1/2})+\theta(x^{1/3})+\dots$, where $\theta(x)=\sum_{p\le x}\log p$. Thus it is much more likely for $\theta(x)$ to be smaller than $x$ than for it to be larger than $x$. By partial summation one gets that $\pi(x)<\mathrm{li}(x)$ much more often than $\pi(x)>\mathrm{li}(x)$. In fact, in a certain sense the probability that $\pi(x)$ 'beats' $\mathrm{li}(x)$ is only $0.00000026\ldots$! We stop here, referring the reader to Rubinstein and Sarnak [29], and the delightful recent survey [14] for more information.
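The bias is visible even at modest heights, where $\pi(x)$ stays below $\mathrm{li}(x)$ throughout. The sketch below is our own. It computes $\mathrm{li}(x)$ for $x>1$ from the classical series $\mathrm{li}(x)=\gamma+\log\log x+\sum_{n\ge1}\frac{(\log x)^n}{n\cdot n!}$ and checks $\pi(x)<\mathrm{li}(x)$ at every multiple of 1000 up to $10^6$. The grid and range are arbitrary choices. Such a finite check of course says nothing about the rare $x$ where the inequality eventually fails.

```python
import bisect
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def li(x, terms=200):
    """Logarithmic integral for x > 1: gamma + log log x + sum_{n>=1} (log x)^n / (n * n!)."""
    L = math.log(x)
    total = EULER_GAMMA + math.log(L)
    term = 1.0
    for n in range(1, terms + 1):
        term *= L / n  # term = L^n / n!
        total += term / n
    return total


def primes_up_to(n):
    """Sieve of Eratosthenes: return the list of primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def pi_below_li_on_grid(limit, step):
    """True if pi(x) < li(x) for every x = step, 2*step, ... up to limit."""
    ps = primes_up_to(limit)
    return all(bisect.bisect_right(ps, x) < li(x) for x in range(step, limit + 1, step))


if __name__ == "__main__":
    ps = primes_up_to(10**6)
    for x in (10**3, 10**4, 10**5, 10**6):
        print(x, bisect.bisect_right(ps, x), round(li(x), 1))
    print("pi(x) < li(x) on the grid:", pi_below_li_on_grid(10**6, 1000))
```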

To summarize, we found three distinct behaviors for the distribution of primes in intervals. At the "microscopic" scale ($h\asymp\log N$) there is Poisson behavior, at the "mesoscopic" scale ($h/\log N\to\infty$, $h=o(N)$) there is Gaussian (normal) behavior, and at the "macroscopic" scale ($h\asymp N$) there is a specific non-universal distribution law. Such division into three regimes occurs in many other problems as well; for example, in the distribution of



⁶We take this opportunity to point out that the important constraint $X\ge T^{1+\varepsilon}$ has been erroneously omitted in a similar discussion (on page 594) in [26].

⁷This is analogous to a result of E. Rains [28] in random matrix theory.

⁸There is perhaps no good reason for this belief, except that the contrary situation is harder to imagine!

⁹The change of variable $x=e^t$ means that $x^{i\gamma}=e^{i\gamma t}$ now takes values uniformly on the unit circle as $t$ varies.
lattice points in the plane. As a starting point, we refer the reader to the recent paper of Hughes and Rudnick [21] and to the references therein.

Lecture 3: Maier's method and an "uncertainty principle" 

If the Riemann Hypothesis is true, then from Selberg's result (2.5) we easily deduce that (for $h\le N$) the number of $n\le N$ with $|\psi(n+h)-\psi(n)-h|>\sqrt h(\log N)^{1+\varepsilon}$ is $\ll N/(\log N)^{2\varepsilon}$. It follows that if $N\ge h\ge(\log N)^{2+\varepsilon}$ then "almost all" intervals $(n,n+h]$ with $n\le N$ contain about the correct number of primes, $\sim h/\log N$. If (2.3) holds then we can even conclude that if $h\ge(\log N)^{1+\varepsilon}$ then "almost all" intervals $(n,n+h]$ with $n\le N$ contain approximately the correct number of primes. In Cramer's model, one can show that almost surely $\sum_{x<n\le x+h}X(n)\sim h/\log x$ if $(\log x)^{2+\varepsilon}\le h\le x$. Thus it seems quite plausible that if $x$ is large and $x\ge h\ge(\log x)^{2+\varepsilon}$ then $\psi(x+h)-\psi(x)\sim h$.

The classical prime number theorem with error term $x\exp(-C\sqrt{\log x})$ tells us that such a result holds if $h\ge x\exp(-C\sqrt{\log x})$. An important advance was made by Hoheisel, who showed that the asymptotic $\psi(x+h)-\psi(x)\sim h$ holds if $x\ge h\ge x^\theta$ for some number $\theta<1$. He was able to take $\theta=1-1/33000$, but this has been improved subsequently, with the best result known, due to Huxley, being $\theta=\frac7{12}+\varepsilon$. If the Riemann hypothesis is true then $\theta$ may be taken as $\frac12+\varepsilon$. The arguments pioneered by Hoheisel depend on the fact that while we don't know RH, we do know that most zeros of $\zeta(s)$ lie close to the $\frac12$ line. For a nice account of these results see Heath-Brown [18]. If the asymptotics given for (2.6) are true then we may take $\theta$ to be any positive number.

Thus it seems the conjecture that $\psi(x+h)-\psi(x)\sim h$ for $x\ge h\ge(\log x)^{2+\varepsilon}$, if true, lies quite deep. This conjecture was widely believed until the mid 1980's, when Maier [22] shattered this belief by showing that for any $\lambda>1$ there are arbitrarily large $x$ such that the interval $(x,x+(\log x)^\lambda]$ contains significantly more primes than usual (that is, $\ge(1+\delta_\lambda)(\log x)^{\lambda-1}$ primes for some $\delta_\lambda>0$), and also intervals $(x,x+(\log x)^\lambda]$ containing significantly fewer primes than usual. In this lecture we will sketch Maier's ingenious method, and describe some extensions of his idea. The reader may also consult [13] for another exposition of related ideas.

Maier's "matrix" method. Let $x$ be large, and $h$ be on the scale of a power of $\log x$. Let $P$ be an integer which we will eventually take to be the product of many small (below $\log x$) primes. Consider the $[x/P] \times h$ "matrix" whose $(i,j)$-th entry is the number $([x/P]+i)P + j$. Thus the entries of this matrix are numbers lying between $x$ and $2x+h$. Note that each row of the matrix consists of an interval $([x/P]+i)P$ to $([x/P]+i)P + h$. Each column of the matrix consists of an arithmetic progression with common difference $P$: namely, $x < n \le 2x+h$ with $n \equiv j \pmod P$. The idea is to count the number of primes in this matrix in two ways: by counting the primes row by row, and by counting the primes column by column, and then comparing the two answers. If we assume that the asymptotic formula for primes in short intervals holds then we get an answer for the row by row calculation. The prime number theorem for arithmetic progressions allows us to do the column by column calculation. Of course the two answers should match. However when $h$ is very small, like a power of $\log x$, there are choices of $P$ for which the answers don't match! This leads to Maier's result.
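The two ways of counting can be tried on a toy example. The sketch below is an illustration only, with the hypothetical sizes $x = 10^6$, $P = 210$ and $h = 10$, far from Maier's regime. It builds the matrix, counts its primes by rows and by columns, and sets both counts beside the heuristic row prediction $xh/(P\log x)$ and column prediction $\frac{x}{\phi(P)\log x}\#\{j \le h : (j,P) = 1\}$.

```python
import math

def prime_flags(n: int) -> bytearray:
    """flags[k] == 1 iff k is prime, for 0 <= k <= n (sieve of Eratosthenes)."""
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return flags

small_primes = (2, 3, 5, 7)
x, P, h = 10**6, math.prod(small_primes), 10   # toy sizes: P = 210, h tiny
phi_P = math.prod(p - 1 for p in small_primes) # phi(210) = 48
rows = x // P                                  # i runs over 1, ..., [x/P]
flags = prime_flags(2 * rows * P + h)          # largest entry is ([x/P] + [x/P]) P + h

def entry(i: int, j: int) -> int:
    """The (i, j)-th entry ([x/P] + i) P + j of the matrix."""
    return (rows + i) * P + j

# Count the primes in the matrix row by row, then column by column.
by_rows = sum(sum(flags[entry(i, j)] for j in range(1, h + 1)) for i in range(1, rows + 1))
by_cols = sum(sum(flags[entry(i, j)] for i in range(1, rows + 1)) for j in range(1, h + 1))

# Heuristic predictions (3.1b) and (3.2), both using log x for log n.
coprime_j = sum(1 for j in range(1, h + 1) if math.gcd(j, P) == 1)
row_pred = x * h / (P * math.log(x))
col_pred = x * coprime_j / (phi_P * math.log(x))
print(by_rows, by_cols, round(row_pred), round(col_pred))
```

Both counts are of course equal. Here only $j = 1$ is coprime to $210$ among $j \le 10$, so the column prediction is the accurate one, while the row prediction, which assumes every short interval behaves typically, overshoots by more than a factor of two.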
Consider the row by row calculation. The number of primes is

(3.1a)   $$\sum_{x/P<n\le 2x/P}\big(\pi(nP+h)-\pi(nP)\big),$$

and if we assume that intervals of length $h$ contain the right number of primes, this is

(3.1b)   $$\approx \frac{x}{P}\cdot\frac{h}{\log x}.$$

Consider next the column by column calculation. If the progression $n \equiv j \pmod P$ is to contain primes, we must have $(j,P) = 1$. In that case the prime number theorem in arithmetic progressions would say that such a progression contains a proportion $1/\phi(P)$ of all primes. Of course, in order to use the prime number theorem in arithmetic progressions rigorously we must pay attention to the size of the modulus $P$ compared with $x$. Assuming that this is not an issue, we find that the column by column contribution is

(3.2)   $$\sum_{\substack{j\le h\\ (j,P)=1}}\big(\pi(2x+h;P,j)-\pi(x;P,j)\big) \approx \frac{x}{\phi(P)\log x}\sum_{\substack{j\le h\\ (j,P)=1}} 1.$$

If we compare (3.2) and (3.1b) we find the relation

(3.3)   $$\sum_{\substack{j\le h\\ (j,P)=1}} 1 \approx h\,\frac{\phi(P)}{P}$$
should hold. At first glance, (3.3) is eminently reasonable: the probability that $j$ is coprime to $P$ is $\phi(P)/P$. It is even easy to make this precise: write the condition $(j,P) = 1$ as $\sum_{d\mid (j,P)}\mu(d)$ and we easily get

(3.4)   $$\sum_{\substack{j\le h\\ (j,P)=1}} 1 = h\,\frac{\phi(P)}{P} + O(d(P)),$$

where $d(P)$ is the number of divisors of $P$. Thus, if $h$ is just a bit larger than $d(P)$ (which is always quite small, that is $\ll P^\epsilon$) then (3.3) will hold. So where is the contradiction? The point is that in Maier's application $h$ is very small compared with $P$ (indeed, even compared with $d(P)$), and so (3.4) is useless.
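Identity (3.4) is easy to test numerically. In this sketch $P = 30030 = 2\cdot3\cdot5\cdot7\cdot11\cdot13$ is an arbitrary squarefree example; the code counts $j \le h$ coprime to $P$ directly and via $\sum_{d\mid P}\mu(d)[h/d]$, and compares the result with $h\phi(P)/P$ and $d(P)$.

```python
import math
from fractions import Fraction
from itertools import combinations

def coprime_count_direct(h: int, P: int) -> int:
    """#{1 <= j <= h : gcd(j, P) = 1}, counted directly."""
    return sum(1 for j in range(1, h + 1) if math.gcd(j, P) == 1)

def coprime_count_mobius(h: int, primes: list[int]) -> int:
    """sum over d | P of mu(d) [h/d], for P the product of the distinct `primes`:
    each divisor d is a product of k of them and mu(d) = (-1)^k."""
    return sum((-1) ** k * (h // math.prod(sub))
               for k in range(len(primes) + 1)
               for sub in combinations(primes, k))

primes = [2, 3, 5, 7, 11, 13]
P = math.prod(primes)                                    # 30030
phi_over_P = math.prod(Fraction(p - 1, p) for p in primes)
d_P = 2 ** len(primes)                                   # d(P) = 64 for squarefree P

for h in (10, 100, 1_000, 100_000):
    exact = coprime_count_direct(h, P)
    assert exact == coprime_count_mobius(h, primes)      # the Mobius identity behind (3.4)
    print(h, exact, float(h * phi_over_P), d_P)
```

For $h = 10$ the main term $h\phi(P)/P \approx 1.92$ while the true count is $1$: when $h$ is not large compared with $d(P)$, (3.4) says little.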

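For the purpose of illustration suppose that $P$ is the product of all primes between $(\log x)^{9/10}$ and $(\log x)/100$. Then, by the prime number theorem, $P$ is about size $x^{1/100+o(1)}$. For such moduli $P$ we don't know the prime number theorem in arithmetic progressions used in (3.2), but such a result does hold if the Riemann hypothesis for Dirichlet $L$-functions is true; let us postpone a discussion of this point. Suppose now that $h$ is a number of size $(\log x)^\theta$ with $2 < \theta < 2.7$. Any product $pq$ of two primes dividing $P$ is at most $(\log x)^2/10^4 < h$, while any product of three exceeds $(\log x)^{2.7} > h$; so only the divisors $d = 1$, $d = p$ and $d = pq$ contribute to $\sum_{d\mid P}\mu(d)[h/d]$, and replacing $[h/d]$ by $h/d$ costs a negligible error. By inclusion-exclusion, the LHS of (3.3) is therefore

$$h - \sum_{(\log x)^{9/10}<p\le(\log x)/100}\frac{h}{p} + \sum_{(\log x)^{9/10}<p<q\le(\log x)/100}\frac{h}{pq} = h\Big(1 - \log\frac{10}{9} + \frac12\Big(\log\frac{10}{9}\Big)^2 + o(1)\Big),$$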
where we have used the prime number theorem to evaluate $\sum 1/p$ for $p$ between $(\log x)^{9/10}$ and $(\log x)/100$. On the other hand, by Mertens' theorem, the RHS of (3.3) is

$$h\prod_{(\log x)^{9/10}<p\le(\log x)/100}\Big(1-\frac1p\Big) \sim \frac{9}{10}\,h.$$

The formula for the LHS has the first three terms in the usual expansion of $\frac{9}{10} = e^{-\log(10/9)}$, so the two answers are certainly close, but obviously they are not equal! Indeed the LHS is a little bit larger, since the omitted terms of this alternating series add up to a negative quantity.
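The numbers in this comparison can be checked directly. The first part of the sketch evaluates $1 - a + a^2/2$ and $e^{-a}$ with $a = \log(10/9)$. The second part is a finite-size illustration under an assumption: since $(\log x)^{9/10} < (\log x)/100$ only once $\log x > 10^{20}$, it drops the factor $1/100$ and lets $L = 10^6$ stand in for $\log x$, summing over primes in $(L^{9/10}, L]$.

```python
import math

def prime_flags(n: int) -> bytearray:
    """flags[k] == 1 iff k is prime, for 0 <= k <= n (sieve of Eratosthenes)."""
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return flags

a = math.log(10 / 9)      # limit of the sum of 1/p over (log x)^(9/10) < p <= (log x)/100
lhs = 1 - a + a**2 / 2    # LHS of (3.3) divided by h: three inclusion-exclusion terms
rhs = math.exp(-a)        # RHS of (3.3) divided by h: the Mertens product, 9/10
print(f"{lhs:.6f} {rhs:.6f}")   # prints 0.900190 0.900000

# Finite-size check with L playing the role of log x and the factor 1/100 dropped.
L = 10**6
flags = prime_flags(L)
ps = [p for p in range(int(L ** 0.9) + 1, L + 1) if flags[p]]
recip_sum = sum(1 / p for p in ps)              # should be close to a
mertens = math.prod(1 - 1 / p for p in ps)      # should be close to 9/10
print(round(recip_sum, 4), round(a, 4), round(mertens, 4))
```

Adding the fourth term $-a^3/6$ pushes the truncated sum just below $9/10$, which is the sign change exploited in Exercise 11.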

Exercise 11. Conclude from the above that for any $2 < \theta < 2.7$ there exist arbitrarily large $x$ such that the interval $[x, x+(\log x)^\theta]$ contains significantly more primes than expected. Taking such an interval and cutting it up into smaller intervals, deduce that the same conclusion holds for all $1 < \theta < 2.7$. Using the same $P$ as above, and taking four terms in the inclusion-exclusion formula, show that if $3 < \theta < 3.6$ there exist intervals $[x, x+(\log x)^\theta]$ with significantly fewer primes than expected. In this manner one can proceed for $\theta < 8.1$, just using inclusion-exclusion and easy calculations. Now replace $(\log x)^{9/10}$ in the definition of $P$ with $(\log x)^{1-\delta}$ for a suitably small $\delta > 0$ and prove Maier's theorem.
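The bookkeeping in Exercise 11 can be tabulated. With $a = \log(10/9)$ and, as in the text, the $k$-fold sum $\sum 1/(p_1\cdots p_k)$ replaced by its limiting value $a^k/k!$, inclusion-exclusion for $k < \theta < 0.9(k+1)$ keeps exactly the terms through $k$-fold products. The sketch below records whether the truncated sum exceeds $9/10$; it uses high-precision decimals because the differences are tiny for large $k$.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 50                 # enough digits to resolve tiny differences
a = (Decimal(10) / Decimal(9)).ln()    # a = log(10/9), so exp(-a) = 9/10 exactly
nine_tenths = Decimal(9) / Decimal(10)

results = {}
for k in range(2, 10):
    lo, hi = Decimal(k), Decimal(9) * (k + 1) / 10   # window k < theta < 0.9 (k + 1)
    if lo >= hi:
        results[k] = "empty"
        continue
    # inclusion-exclusion through k-fold products, with the k-fold sum taken as a^k / k!
    partial = sum((-a) ** i / math.factorial(i) for i in range(k + 1))
    results[k] = "more" if partial > nine_tenths else "fewer"
    print(f"{k} < theta < {hi}: {results[k]} primes than expected")
print("k = 9:", results[9])
```

The windows alternate between excess and deficit, and the window for $k = 9$ is empty, which matches the bound $\theta < 8.1$.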

More on the contradiction. Now let us describe a different way of seeing a contradiction to (3.3). This method is very flexible, works for many choices of $P$, and generalizes readily. Let $y$ be a large parameter; in the application we may think of $y$ as being some power of $\log x$. From each dyadic block $[2^{-j}y, 2^{1-j}y]$ with $1 \le j \le [\log y/(2\log 2)]$ select about half the primes. Take $P$ to be the product of these selected primes. Thus $P$ is composed of about half the primes in $[\sqrt{y}, y]$, and there are plenty of choices for $P$. Let $u > 1$ be a real number, set $h = y^u$ and consider whether (3.3) can hold. We will show that for arbitrarily large $u$ the LHS is appreciably larger than the RHS, and for arbitrarily large $u$ it is smaller.
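One concrete way to make the selection, offered as a hypothetical rule since the text only asks for about half the primes in each block, is to keep every other prime of each dyadic block. The sketch builds the prime factors of $P$ for $y = 10^6$; shifting the starting index gives a second, disjoint choice, illustrating that there are plenty of choices for $P$.

```python
import math

def prime_flags(n: int) -> bytearray:
    """flags[k] == 1 iff k is prime, for 0 <= k <= n (sieve of Eratosthenes)."""
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return flags

def select_primes(y: int, keep: int = 0) -> list[int]:
    """Hypothetical selection rule: from each dyadic block (2^-j y, 2^(1-j) y],
    1 <= j <= [log y / (2 log 2)], keep every other prime, starting at index `keep`."""
    flags = prime_flags(y)
    J = int(math.log(y) / (2 * math.log(2)))
    chosen = []
    for j in range(1, J + 1):
        lo, hi = y / 2**j, y / 2 ** (j - 1)
        block = [p for p in range(math.floor(lo) + 1, math.floor(hi) + 1) if flags[p]]
        chosen.extend(block[keep::2])
    return chosen

y = 10**6
P_primes = select_primes(y)            # one choice of P ...
other = select_primes(y, keep=1)       # ... and a disjoint second choice
log_P = sum(math.log(p) for p in P_primes)
print(len(P_primes), len(other), round(log_P), y // 2)
```

Here $\log P$ comes out close to $y/2$, as the prime number theorem suggests when half the primes up to $y$ are used.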

To see this we consider the Dirichlet series $\zeta_P(s) = \sum_{n\ge1,\,(n,P)=1} n^{-s}$. Plainly we have

(3.5)   $$\zeta_P(s) = \zeta(s)\prod_{p\mid P}\Big(1-\frac{1}{p^s}\Big),$$

so that $\zeta_P(s)$ extends to a meromorphic function in all of $\mathbb{C}$ with a simple pole at $s = 1$.
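The point is that if something like (3.3) holds then $\zeta_P(s)$ must approximately look like $\zeta(s)\phi(P)/P$, and by choosing $s$ appropriately we can obtain a contradiction to (3.5). More precisely, set

$$E(u) = y^{-u}\Big(\sum_{\substack{n\le y^u\\ (n,P)=1}} 1 - \frac{\phi(P)}{P}\sum_{n\le y^u} 1\Big).$$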

Then, for $\mathrm{Re}(s) > 1$,

$$\zeta_P(s) - \zeta(s)\frac{\phi(P)}{P} = s\int_1^\infty\Big(\sum_{\substack{n\le z\\ (n,P)=1}} 1 - \frac{\phi(P)}{P}\sum_{n\le z} 1\Big) z^{-s-1}\,dz,$$

upon integrating by parts. Changing variables $u = \log z/\log y$ we obtain that

(3.6)   $$\zeta_P(s) = \zeta(s)\frac{\phi(P)}{P} + s\log y\int_0^\infty E(u)\,y^{u(1-s)}\,du.$$

To start with, (3.6) is valid for $\mathrm{Re}(s) > 1$, but since $E(u) \ll d(P)y^{-u}$ by (3.4), we see that (3.6) makes sense for $\mathrm{Re}(s) > 0$.
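Both (3.5) and the bound on $E(u)$ can be sanity-checked numerically. The sketch below uses the toy modulus $P = 30030$, an illustrative choice rather than one of the moduli of the argument. It compares a truncated series for $\zeta_P(2)$ with $\zeta(2)\prod_{p\mid P}(1-p^{-2})$, and it scans $z = y^u$ over integers to confirm that $|y^u E(u)| = |\#\{n \le z : (n,P)=1\} - \frac{\phi(P)}{P}[z]|$ never exceeds $d(P)$, where $E(u)$ is the normalized counting error $y^{-u}\big(\#\{n \le y^u : (n,P)=1\} - \frac{\phi(P)}{P}[y^u]\big)$.

```python
import math
from fractions import Fraction

primes = [2, 3, 5, 7, 11, 13]     # toy modulus P = 30030 (an illustrative choice)
P = math.prod(primes)
phi = math.prod(p - 1 for p in primes)
d_P = 2 ** len(primes)            # number of divisors of the squarefree P

# (3.5) at the real point s = 2: truncated Dirichlet series versus the Euler-factor formula.
N = 200_000
lhs = sum(1 / n**2 for n in range(1, N + 1) if math.gcd(n, P) == 1)
rhs = (math.pi**2 / 6) * math.prod(1 - p**-2 for p in primes)
print(lhs, rhs)   # differ by the tail of the series, about (phi(P)/P) / N

# P * (count - (phi/P) z) = P * count - phi * z, kept in exact integers. Both terms are
# constant between consecutive integers, so integer z give the sup of |y^u E(u)|.
count, worst_scaled = 0, 0
for z in range(1, 100_001):
    if math.gcd(z, P) == 1:
        count += 1
    worst_scaled = max(worst_scaled, abs(P * count - phi * z))
worst = Fraction(worst_scaled, P)
print(float(worst), d_P)          # the bound |E(u)| <= d(P) y^(-u) used for (3.6)
```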

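Exercise 12. Let $(\log y)/2 \ge \xi \ge 1$ be a real number, and take $s = 1 - \xi/\log y + i\pi/\log y$. Using (3.5) prove that

$$|\zeta_P(s)| \gg \frac{\log y}{\xi}\exp\Big(\frac{e^{\xi}}{2\xi}\Big(1+O\Big(\frac{1}{\xi}\Big)\Big)\Big).$$

Then using (3.6) deduce that

$$\int_0^\infty |E(u)|\,e^{\xi u}\,du \gg \frac{1}{\xi}\exp\Big(\frac{e^{\xi}}{2\xi}\Big(1+O\Big(\frac{1}{\xi}\Big)\Big)\Big).$$

Show that $\int_0^\infty E(u)\,e^{\xi u}\,du \ll 1/\xi$, so that in the LHS above both positive and negative values of $E(u)$ make roughly equal contributions.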
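The first inequality of Exercise 12 rests on a sign observation: for $\sqrt y \le p \le y$ write $t = \log p/\log y \in [1/2, 1]$; then $p^{-i\pi/\log y} = e^{-i\pi t}$ has non-positive real part, so $\mathrm{Re}\,p^{-s} \le 0$ and each factor satisfies $|1 - p^{-s}| \ge 1$. The sketch below takes, as a hypothetical $P$, every other prime in $[\sqrt y, y]$ with $y = 10^6$, and computes $\log|\prod_{p\mid P}(1-p^{-s})|$ for several $\xi$ next to the heuristic size $e^\xi/(2\xi)$. At such a small $y$ only the qualitative growth, not the constant, should be read off.

```python
import math

def prime_flags(n: int) -> bytearray:
    """flags[k] == 1 iff k is prime, for 0 <= k <= n (sieve of Eratosthenes)."""
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return flags

y = 10**6
log_y = math.log(y)
flags = prime_flags(y)
# Hypothetical P: every other prime in [sqrt(y), y], i.e. roughly half of them.
P_primes = [p for p in range(math.isqrt(y), y + 1) if flags[p]][::2]

def log_abs_euler_product(xi: float) -> float:
    """log |prod_{p | P} (1 - p^(-s))| at s = 1 - xi/log y + i*pi/log y."""
    s = complex(1 - xi / log_y, math.pi / log_y)
    return sum(math.log(abs(1 - p ** (-s))) for p in P_primes)

xis = [1, 2, 3, 4, 5, 6]          # within the range 1 <= xi <= (log y)/2 of Exercise 12
values = [log_abs_euler_product(xi) for xi in xis]
for xi, v in zip(xis, values):
    print(xi, round(v, 2), round(math.exp(xi) / (2 * xi), 2))
```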

Morally, Exercise 12 shows that $E(u)$ cannot be too small for large $u$. To make this precise, one also needs an upper bound for $E(u)$ so as to be able to bound the tail of the integral in Exercise 12. Developing this argument carefully, one may show that there is a positive constant $A$ such that every interval $[u(1 - A/\log u), u(1 + A/\log u)]$ contains points $u_\pm$ satisfying

$$E(u_+) > \exp\big(-u_+(\log u_+ + \log\log u_+ + O(1))\big)$$

and

$$E(u_-) < -\exp\big(-u_-(\log u_- + \log\log u_- + O(1))\big).$$

For more details, see section 3 of [16], especially Corollary 3.3.

Earlier, we postponed discussion of the prime number theorem in arithmetic progressions. We refer the reader to Davenport [6] for an account of this. In Chapter 20 there one finds Page's result that $\psi(x;q,a) \sim x/\phi(q)$, for $(a,q) = 1$, for all $q \le \exp(C\sqrt{\log x})$ with the possible exception of multiples of a particular modulus $q_1$ which may depend on $x$. If we choose $y$ a little less than $\sqrt{\log x}$ then our moduli $P$ above are below $\exp(C\sqrt{\log x})$ and certainly we can find $P$ that are not multiples of the exceptional modulus $q_1$. Thus our appeal to the prime number theorem in arithmetic progressions can be made rigorous.

The flexibility in choosing $P$ is quite useful. Exploiting this, Granville and I (see [16]) showed that the asymptotic

(3.7)   $$\psi(x+h) - \psi(x) = h + O(h^{1/2+\epsilon}),$$

suggested by Cramér's model, sometimes fails to hold if $h \le \exp((\log x)^{1/3-\epsilon})$. This improves work of Hildebrand and Maier [19] who had obtained this result assuming the Generalized Riemann Hypothesis, and a weaker result unconditionally. It seems safe to conjecture that (3.7) holds if $h \ge x^\epsilon$, and perhaps it holds when $h \ge \exp((\log x)^{1/2+\epsilon})$.

An uncertainty principle. Maier's method can be adapted to establish limitations to the equidistribution of primes in arithmetic progressions. For example, Friedlander and Granville [8] proved that for every $A > 1$ there exist large $x$ and an arithmetic progression $a \pmod q$ with $(a,q) = 1$ and $q \le x/(\log x)^A$ such that

$$\Big|\pi(x;q,a) - \frac{\pi(x)}{\phi(q)}\Big| \gg \frac{\pi(x)}{\phi(q)}.$$

More recently, Balog and Wooley [1] showed that the sequence of integers which may be written as the sum of two squares also exhibits "Maier type" irregularities in intervals $(x, x+(\log x)^\lambda)$ for any fixed, positive $\lambda$. Previously Maier's work had seemed inextricably linked to the mysteries of primes, but Balog and Wooley's result suggests that such results should be part of a more general phenomenon. This has been formalized by Granville and me as an "uncertainty principle" for arithmetic sequences. What Maier's argument shows is that the primes cannot be simultaneously well distributed in short intervals and in arithmetic progressions. Then a suitable version of the prime number theorem in arithmetic progressions is used to remove the second possibility, leaving us with the irregularities of distribution in short intervals. The first conclusion, that there are irregularities either in short intervals or in progressions, turns out to be a general feature of many interesting arithmetical sequences.

A rough description of this result is as follows: Let $\mathcal{A}$ denote a sequence $a(n)$ of non-negative real numbers, and let $A(x) = \sum_{n\le x} a(n)$. If $\mathcal{A}$ is well-distributed in short intervals, then we may expect that

(3.8)   $$A(x+y) - A(x) \approx y\,\frac{A(x)}{x}.$$

To understand the distribution of $a(n)$ in arithmetic progressions we begin with $n$ that are multiples of a given number $d$. We suppose that there is a non-negative multiplicative function $h$ such that

(3.9a)   $$A_d(x) := \sum_{\substack{n\le x\\ d\mid n}} a(n) \approx \frac{h(d)}{d}A(x).$$

We assume that the asymptotic behavior of

$$A(x;q,a) := \sum_{\substack{n\le x\\ n\equiv a \pmod q}} a(n)$$

depends only on the g.c.d. of $a$ and $q$. Then (3.9a) leads to the prediction that

(3.9b)   $$A(x;q,a) \approx \frac{f_q(a)}{q\gamma_q}A(x),$$

with $\gamma_q = \prod_{p\mid q}(p-1)/(p-h(p))$, where $f_q(a)$ is a certain non-negative multiplicative function of $a$, defined in terms of $h$, such that $f_q(a) = f_q((a,q))$, so that $f_q(a)$ is periodic (mod $q$).
We can be flexible in how we want to assume (3.9b); for example, sometimes it is convenient 
to assume it only for q that are coprime to a certain fixed set of primes. 
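To make the normalization in (3.9b) concrete, the following sketch computes $\gamma_q$ and the predicted share $f_q(a)/(q\gamma_q)$ of $A(x)$ in each class $a \pmod q$, using exact fractions, for the two simplest sequences of the examples that follow ($a(n) = 1$, and the primes). The helper names are hypothetical. The check that the shares over all classes sum to $1$ is a consistency condition implicit in (3.9b).

```python
import math
from fractions import Fraction

def prime_divisors(q: int) -> list[int]:
    """Distinct prime divisors of q, by trial division."""
    ps, m, p = [], q, 2
    while p * p <= m:
        if m % p == 0:
            ps.append(p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        ps.append(m)
    return ps

def gamma(q: int, h) -> Fraction:
    """gamma_q = prod over p | q of (p - 1) / (p - h(p)); h maps a prime to a Fraction."""
    return math.prod((Fraction(p - 1) / (p - h(p)) for p in prime_divisors(q)), start=Fraction(1))

def predicted_share(q: int, a: int, h, f) -> Fraction:
    """Proportion of A(x) that (3.9b) predicts for the class a (mod q): f_q(a) / (q gamma_q)."""
    return Fraction(f(q, a)) / (q * gamma(q, h))

# a(n) = 1: h = 1 and f_q = 1.
h_all, f_all = (lambda p: Fraction(1)), (lambda q, a: 1)
# Primes: h(p) = 0 and f_q(a) = 1 exactly when (a, q) = 1.
h_primes, f_primes = (lambda p: Fraction(0)), (lambda q, a: 1 if math.gcd(a, q) == 1 else 0)

q = 12
print(gamma(q, h_all), gamma(q, h_primes))                 # 1 and phi(12)/12 = 1/3
print([str(predicted_share(q, a, h_primes, f_primes)) for a in range(q)])
```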

To illustrate the framework consider the following examples.
Example 1. Take $a(n) = 1$ for all $n$. It is natural to take $h(d) = 1$ for all $d$, $\gamma_q = 1$, and $f_q(a) = 1$. Then (3.9a) and (3.9b) are both good approximations with errors at most 1.
Example 2. Take $a(n)$ to be the indicator function of the primes. Then $h(1) = 1$ and $h(d) = 0$ for $d > 1$. One has $\gamma_q = \phi(q)/q$, and $f_q(a) = 1$ if $(a,q) = 1$ and $0$ otherwise. The prime number theorem in arithmetic progressions gives (3.9b) for small values of $q$. The result of Friedlander and Granville places restrictions on the approximation (3.9b) when $q$ is large. Maier's results place restrictions on (3.8) for small $y$.

Example 3. Take $a(n)$ to be the indicator function of the sums of two squares. The multiplicative function $h$ is defined by $h(p^k) = 1$ if $p = 2$ or $p^k \equiv 1 \pmod 4$, and $h(p^k) = 1/p$ if $p^k \equiv 3 \pmod 4$. Here Balog and Wooley's result places restrictions on (3.8).
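The values of $h$ in Example 3 reflect two exact identities: for $p = 2$ or $p \equiv 1 \pmod 4$, $n = pm$ is a sum of two squares exactly when $m$ is, while for $p \equiv 3 \pmod 4$ a sum of two squares divisible by $p$ is divisible by $p^2$ and has cofactor a sum of two squares. Hence $A_p(x) = A(x/p)$ or $A(x/p^2)$ respectively. The sketch below, with $N = 10^6$ an arbitrary cutoff, checks this by brute force and prints $A_p(N)/A(N)$ beside $h(p)/p$.

```python
import math
from itertools import accumulate

def sum_of_two_squares_flags(n: int) -> bytearray:
    """flags[m] == 1 iff 1 <= m <= n is a sum of two squares of integers."""
    flags = bytearray(n + 1)
    for a in range(math.isqrt(n) + 1):
        for b in range(a, math.isqrt(n - a * a) + 1):
            flags[a * a + b * b] = 1
    flags[0] = 0                      # 0 = 0^2 + 0^2 is not counted
    return flags

N = 10**6
S = sum_of_two_squares_flags(N)
A = list(accumulate(S))               # A[x] = #{1 <= m <= x : m is a sum of two squares}

def A_d(x: int, d: int) -> int:
    """Number of sums of two squares up to x that are divisible by d, as in (3.9a)."""
    return sum(S[m] for m in range(d, x + 1, d))

def h(p: int) -> float:
    """h(p) from Example 3 (p prime): 1/p if p = 3 (mod 4), else 1."""
    return 1 / p if p % 4 == 3 else 1.0

for p in (2, 3, 5, 7, 13, 19):
    print(p, round(A_d(N, p) / A[N], 4), round(h(p) / p, 4))
```

Because $A(x)$ grows like a constant times $x/\sqrt{\log x}$, the printed ratios approach $h(p)/p$ only slowly.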

The main results of [16] give that if $h(p)$ is not always close to 1 (as in the regular Example 1) then there will be moduli $q$ for which (3.9b) cannot hold. Typically these moduli will be large as in the Friedlander-Granville result for primes in progressions. Furthermore, either there exist values $y$ larger than an arbitrary power of $\log x$ for which (3.8) is false, or there exist small moduli $q$ (below $\exp((\log x)^{c})$ for a suitable constant $c > 0$) for which (3.9b) is false. These results include the previous results on primes and sums of two squares, and also cover many other examples.

Consider sets containing roughly half of the prime numbers. There are uncountably 
many such sets, and so maybe we can find a set which is very well distributed in arithmetic 
progressions. One amusing example from [16] shows that this cannot be done, and the 
Friedlander-Granville limitations persist for any such set. 

We content ourselves with this vague description of the uncertainty principle, referring 
the reader to [16] for more examples and a precise description of the results. 